Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks



Zengfeng Huang Ke Yi Qin Zhang 

Hong Kong University of Science and Technology MADALGO, University of Aarhus 
{huangzf, yike}@cse.ust.hk qinzhang@cs.au.dk 

December 5, 2011 




Abstract 


We show that randomization can lead to significant improvements for a few fundamental problems
in distributed tracking. Our basis is the count-tracking problem, where there are $k$ players, each holding
a counter $n_i$ that gets incremented over time, and the goal is to track an $\epsilon$-approximation of their sum
$n = \sum_i n_i$ continuously at all times, using minimum communication. While the deterministic
communication complexity of the problem is $\Theta(k/\epsilon \cdot \log N)$, where $N$ is the final value of $n$ when the
tracking finishes, we show that with randomization, the communication cost can be reduced to
$\Theta(\sqrt{k}/\epsilon \cdot \log N)$. Our algorithm is simple and uses only $O(1)$ space at each player, while the lower
bound holds even assuming each player has infinite computing power. Then, we extend our techniques to two
related distributed tracking problems: frequency-tracking and rank-tracking, and obtain similar improvements
over previous deterministic algorithms. Both problems are of central importance in large data monitoring and
analysis, and have been extensively studied in the literature.

1 Introduction 


We start with a very basic problem in distributed tracking, what we call count-tracking. There are $k$ players,
each holding a counter that is initially 0. Over time, the counters get incremented, and we denote by
$n_i(t)$ the value of the counter $n_i$ at time $t$. The goal is to track an $\epsilon$-approximation of the total count
$n(t) = \sum_i n_i(t)$, i.e., an $\hat{n}(t)$ such that $(1-\epsilon)n(t) \le \hat{n}(t) \le (1+\epsilon)n(t)$,¹ continuously at all times.
There is a coordinator whose job is to maintain such an $\hat{n}(t)$, and it will try to do so using minimum
communication with the $k$ players (the formal model of computation will be defined shortly).

There is a trivial solution to the count-tracking problem: Every time a counter $n_i$ has increased by a
$1+\epsilon$ factor, the player informs the coordinator of the change. Thus, the coordinator always has an
$\epsilon$-approximation of every $n_i$, hence an $\epsilon$-approximation of their sum $n$. Letting $N$ denote the final value of
$n$, simple analysis shows that the communication cost of this algorithm is $O(k/\epsilon \cdot \log N)$.² This algorithm
was actually used in [16] for solving essentially the same problem, which also provided many practical
motivations for studying this problem. Note that this algorithm is deterministic and only uses one-way
communication (from the players to the coordinator), and yet it turns out this simple algorithm is already



1 We sometimes omit "(t)" when the context is clear.

2 A more careful analysis leads to a slightly better bound of $O(k/\epsilon \cdot \log(\epsilon N/k))$, but we will assume that $N$ is
sufficiently large, compared to $k$ and $1/\epsilon$, to simplify the bounds.
[Table 1 body lost in extraction; surviving cell fragments: [29], new, $O(1)$, $O(1/\epsilon^2 \cdot \log N)$]

Table 1: Space and communication costs of previous and new algorithms. We assume $k \le 1/\epsilon^2$. All upper
bounds are in terms of words. *This is conditioned upon the communication cost being $O(\sqrt{k}/\epsilon \cdot \log N)$
bits.



optimal for deterministic algorithms, even if two-way communication is allowed [29]. Thus the immediate 
questions are: What about randomized algorithms that are allowed to fail with a small probability? Is two- 
way communication not useful at all? In this paper, we set out to address these questions, and then move on 
to consider other related distributed tracking problems. 
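The trivial threshold-based solution described above can be sketched as a small simulation. This is only an illustration under assumptions: the function name, the round-robin workload, and the choice to report as soon as a counter exceeds $(1+\epsilon)$ times its last reported value are ours, not the paper's.

```python
def simulate_threshold_tracking(arrivals, k, eps):
    """Deterministic one-way protocol: a site reports n_i once it exceeds
    (1 + eps) times the value it last reported."""
    counts = [0] * k      # true local counters n_i
    reported = [0] * k    # last value of n_i known to the coordinator
    n = est = messages = 0
    worst_rel_err = 0.0
    for site in arrivals:
        counts[site] += 1
        n += 1
        if counts[site] > (1 + eps) * reported[site]:
            est += counts[site] - reported[site]
            reported[site] = counts[site]
            messages += 1
        # The coordinator's estimate is the sum of the last reported values.
        worst_rel_err = max(worst_rel_err, (n - est) / n)
    return messages, worst_rel_err


arrivals = [i % 4 for i in range(4000)]   # hypothetical round-robin workload
messages, worst = simulate_threshold_tracking(arrivals, k=4, eps=0.1)
print(messages, worst)
```

In this sketch the relative error never exceeds `eps`, and the number of messages grows with the number of sites and with the logarithm of the stream length rather than with the stream length itself.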

1.1 The distributed tracking model 

We first give a more formal definition of the computation model that we will work with, which is essentially
the same as those used in prior work on distributed tracking [2, 3, 5, 6, 8, 9, 16, 23, 29]. There are $k$
distributed sites $S_1, \ldots, S_k$, each receiving a stream of elements over time, possibly at varying rates. Let $N$ be
the total number of elements in all $k$ streams. We denote by $A_i(t)$ the multiset (bag) of elements received by
$S_i$ up until time $t$, and let $A(t) = \biguplus_{i=1}^{k} A_i(t)$ be the combined data set, where $\uplus$ denotes multiset addition.
There is a coordinator whose job is to maintain (an approximation of) $f(A(t))$ continuously at all times,
for a given function $f$ (e.g., $f(A(t)) = |A(t)|$ for the count-tracking problem above). The coordinator has
a direct two-way communication channel with each of the sites; note that broadcasting a message costs $k$
times the communication for a single message. The sites do not communicate with each other directly, but
this is not a limitation since they can always pass messages via the coordinator. We assume that
communication is instant, i.e., no element will arrive until all parties have decided not to send more messages. As in
prior work, our measures of complexity will be the communication cost and the space used to process each
stream. Unless otherwise specified, the unit of both measures is a word, and we assume that any integer less
than $N$, as well as an element from the stream, can fit in one word.

This model was initially abstracted from many applied settings, ranging from distributed data monitoring
and wireless sensor networks to network traffic analysis, and has been extensively studied in the database
community. Since 2008 [8], the model has started to attract interest from the theory community as well,
as it naturally combines two well-studied models: the data stream model and multi-party communication
complexity. When there is only $k = 1$ site, which also plays the role of the coordinator, the model degenerates
to the standard streaming model; when $k \ge 2$ and our goal is to do a one-shot computation of $f(A(\infty))$,
the model degenerates to the (number-in-hand) $k$-party communication model. Thus, distributed tracking
is more general than both models. Meanwhile, it also appears to be significantly different from either,
with the above count-tracking problem being the best example. This problem is trivial in both the streaming
and the communication model (even computing the exact count is trivial), whereas it becomes nontrivial in
the distributed tracking model and requires new techniques, especially when randomization is allowed, as
illustrated by our results in this paper.

Note that there is some work on distributed streaming (see e.g. [10, 11, 17, 30]) that adopts a model very
similar to ours, but with a fundamental difference. In their model there are $k$ streams, each of which runs
a streaming algorithm on its local data, but the function $f$ on the combined streams is computed only at
the end or upon requests by the user. As one can see, the count-tracking problem is also trivial in this
model. The crucial difference is that, in this model, the sites wait passively to get polled. If we want to track
$f$ continuously, we have to poll the sites all the time, whereas in our model, the sites actively participate in
the tracking protocol to make sure that $f$ is always up-to-date.

1.2 Problem statements, previous and new results 

In this paper, we first study the count-tracking problem. Then we extend our approach to two related, more
general problems: frequency-tracking and rank-tracking. Both problems are of central importance in large
data monitoring and analysis, and have been extensively studied in the literature. In all the communication
upper bounds, we will assume $k \le 1/\epsilon^2$; otherwise all of them will carry an extra additive $O(k \log N)$
term. There are other good reasons to justify this assumption, which we will explain later. All our results
are summarized in Table 1; below we discuss each of them in turn.

As mentioned earlier, the deterministic communication complexity for the count-tracking problem has
been settled at $\Theta(k/\epsilon \cdot \log N)$ [29],³ with or without two-way communication. In this paper, we show that
with randomization and two-way communication, this is reduced to $\Theta(\sqrt{k}/\epsilon \cdot \log N)$. We first present in
Section 2.1 a randomized algorithm with this communication cost that, at any one given time instance, maintains
an $\epsilon$-approximation of the current $n$ with a constant probability. The algorithm is very simple and uses $O(1)$
space at each site. It is easy to make the algorithm correct for all time instances and boost the probability
to $1-\delta$: Since we can use the same approximate value $\hat{n}$ of $n$ until $n$ grows by a $1+\epsilon$ factor, it suffices to
make the algorithm correct for $O(\log_{1+\epsilon} N) = O(1/\epsilon \cdot \log N)$ time instances. Then running
$O(\log \frac{\log N}{\delta\epsilon})$ independent copies of the algorithm and taking the median will achieve the goal of tracking $n$
continuously at all times, with probability at least $1-\delta$. The $\Omega(\sqrt{k}/\epsilon \cdot \log N)$ lower bound (Section 2.2)
actually holds on the number of messages that have to be exchanged, regardless of the message size, and holds
even assuming the sites have unlimited space and computing power. That randomization is necessary to achieve
this $\sqrt{k}$-factor improvement follows from the previous deterministic lower bound [29]; here in Section 2.2 we give
a proof that two-way communication is also required. More precisely, we show that any randomized
algorithm with one-way communication has to use $\Omega(k/\epsilon \cdot \log N)$ communication, i.e., the same as that for
deterministic algorithms.
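The median-of-copies boosting step mentioned above can be illustrated with a minimal sketch. The function names and the constant `c` are assumptions made for illustration; the paper only states the asymptotic number of copies.

```python
import math
import statistics


def copies_needed(N, eps, delta, c=1.0):
    """Copies to run: c * log(log N / (delta * eps)); c is an assumed constant."""
    checkpoints = math.log(N) / eps          # roughly log_{1+eps} N time instances
    return max(1, math.ceil(c * math.log(checkpoints / delta)))


def boosted_estimate(estimates):
    """Median of the copies' estimates: accurate whenever more than half of them are."""
    return statistics.median(estimates)


# Two of the five hypothetical copies are far off; the median ignores them.
print(boosted_estimate([100, 98, 5000, 101, -300]))
print(copies_needed(N=10**6, eps=0.01, delta=0.01))
```

The median is used rather than the mean because a single badly wrong copy can move the mean arbitrarily far, while the median stays within the error bound as long as a majority of copies do.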

In the frequency-tracking (a.k.a. heavy hitters tracking) problem, $A(t)$ is a multiset of cardinality $n(t)$
at time $t$. Let $f_j(t)$ be the frequency of element $j$ in $A(t)$. The goal is to maintain a data structure from
which $f_j(t)$, for any given $j$, can be estimated with absolute error at most $\epsilon n(t)$, with probability at least
0.9 (say). Note that this problem degenerates to count-tracking when there is only one element. It is
reasonable to ask for an error in terms of $n(t)$: if the error were $\epsilon f_j(t)$, then every element would have
to be reported if they were all distinct. In fact, this error requirement is the widely accepted definition
for the heavy hitters problem, which has been extensively studied in the streaming literature [7]. Several
algorithms with the optimal $O(1/\epsilon)$ space exist [18-20]. In the distributed tracking model, we previously
[29] gave a deterministic algorithm with $O(k/\epsilon \cdot \log N)$ communication, which is the best possible for
deterministic algorithms. In this paper, by generalizing our count-tracking algorithm, we reduce the cost to
$O(\sqrt{k}/\epsilon \cdot \log N)$, with randomization (Section 3). Since this problem is more general than count-tracking,
by the count-tracking lower bound, this is also optimal. Our algorithm uses $O(1/(\epsilon\sqrt{k}))$ space to process
the stream at each site, which is actually smaller than the $\Omega(1/\epsilon)$ space lower bound for this problem
in the streaming model. This should not come as a surprise: because the site is allowed to
communicate with the coordinator during the streaming process, the streaming lower bounds do not apply in
our model. To this end, we prove a new space lower bound of $\Omega(1/(\epsilon\sqrt{k}))$ bits for our model, showing that
our algorithm also uses near-optimal space. This space lower bound is conditioned upon the requirement
that the communication cost should be $O(\sqrt{k}/\epsilon \cdot \log N)$ bits. Note that it is not possible to prove a space
lower bound unconditional of communication: A site can send every element to the coordinator and thus
only needs $O(1)$ space. In fact, what we prove is a space-communication trade-off; please see Section 3.2
for the precise statement.

3 The lower bound in [29] was stated for the heavy hitters tracking problem, but essentially the same proof works for
count-tracking.

For the rank-tracking problem, it will be convenient to assume that the elements are drawn from a
totally ordered universe and $A(t)$ contains no duplicates. The rank of an element $x$ in $A(t)$ ($x$ may not be
in $A(t)$) is the number of elements in $A(t)$ smaller than $x$, and our goal is to compute a data structure from
which the rank of any given $x$ can be estimated with error at most $\epsilon n(t)$, with constant probability. Note
that a rank-tracking algorithm also solves the frequency-tracking problem (but not vice versa), by turning
each element $x$ into a pair $(x, y)$ to break all ties and maintaining such a rank-tracking data structure.
When the frequency of $x$ is desired, we ask for the ranks of $(x, 0)$ and $(x, \infty)$ and take the difference. We
previously [29] gave a deterministic algorithm for the rank-tracking problem with communication
$O(k/\epsilon \cdot \log N \log^2(1/\epsilon))$. In this paper, we show in Section 4 how randomization can bring this down to
$O(\sqrt{k}/\epsilon \cdot \log N \log^{1.5}(1/(\epsilon\sqrt{k})))$, which is again optimal ignoring $\mathrm{polylog}(1/\epsilon, k)$ factors. Since rank-tracking
is more general than frequency-tracking, the previous lower bounds also hold here. Our algorithm uses space
that is also close to the $\Omega(1/(\epsilon\sqrt{k}))$ lower bound.
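The reduction from frequency to rank queries described above can be sketched as follows. The class below is a hypothetical stand-in that answers rank queries exactly from a sorted list; the paper's structure is approximate and distributed, so this only illustrates the tie-breaking trick with pairs $(x, y)$.

```python
import bisect
import math


class ExactRankStructure:
    """Stand-in for a rank-tracking structure (exact here, approximate in the paper)."""

    def __init__(self):
        self._items = []       # sorted list of distinct (x, y) pairs
        self._next_tag = {}    # per-value counter used to break ties

    def insert(self, x):
        y = self._next_tag.get(x, 0) + 1   # tags start at 1, so (x, 0) precedes every copy of x
        self._next_tag[x] = y
        bisect.insort(self._items, (x, y))

    def rank(self, pair):
        """Number of stored pairs strictly smaller than `pair`."""
        return bisect.bisect_left(self._items, pair)


def frequency(structure, x):
    """Frequency of x as the difference of the ranks of (x, inf) and (x, 0)."""
    return structure.rank((x, math.inf)) - structure.rank((x, 0))


s = ExactRankStructure()
for value in [3, 1, 3, 2, 3, 1]:
    s.insert(value)
print(frequency(s, 3), frequency(s, 1), frequency(s, 5))
```

With an approximate rank structure in place of the exact one, each of the two rank queries contributes its own error, so the frequency estimate is off by at most the sum of the two.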

Since we are talking about randomized algorithms with a constant success probability, we should also
compare with random sampling. It is well known [25] that this probabilistic guarantee can be achieved for
all the problems above by taking a random sample of size $O(1/\epsilon^2)$. A random sample can be maintained
continuously over distributed streams [9], solving these distributed tracking problems, with a communication
cost of $O(1/\epsilon^2 \cdot \log N)$. This is worse than our algorithms when $k = o(1/\epsilon^2)$. As noted earlier, all the upper
bounds we have mentioned above have a hidden additive $O(k \log N)$ term, including that for the random
sampling algorithm. Thus when $k = \Omega(1/\epsilon^2)$, all of them boil down to $O(k \log N)$, while $\Omega(k)$ is an easy
lower bound for all these problems (see Theorem 2.3). This means that when $k = \Omega(1/\epsilon^2)$, all problems
can be solved optimally by just random sampling, up to an $O(\log N)$ factor. Therefore, $k = o(1/\epsilon^2)$ is the
more interesting case worth studying. In addition, as the error (in particular for the frequency-tracking
and the rank-tracking problems) is in terms of $n$, the current size of the entire data set, typical values of $\epsilon$
are quite small. For example, $\epsilon = 10^{-2} \sim 10^{-4}$ was used in the experimental study [7] for these problems
in the streaming model, while $k$ usually ranges from 10 to $10^4$. Thus we will assume $k \le 1/\epsilon^2$ in all the
upper bounds throughout the paper.

The idea behind all our algorithms is very simple. Instead of deterministic algorithms, we use randomized
algorithms that produce unbiased estimators for $n_i$, the frequencies, and ranks with variance $(\epsilon n)^2/k$,
leading to an overall variance of $(\epsilon n)^2$, which is sufficient to produce an estimate within error $\epsilon n$ with
constant probability. This means we can afford an error of $\epsilon n/\sqrt{k}$ from each site, as opposed to $\epsilon n/k$ for
deterministic algorithms. This is essentially where we obtain the $\sqrt{k}$-factor improvement by randomization. Our
algorithms are simple and extremely lightweight, in particular the count-tracking and frequency-tracking
algorithms, and thus can be easily implemented in power-limited distributed systems like wireless sensor
networks.
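Spelled out for count-tracking, and assuming the $k$ per-site estimators $\hat{n}_i$ are independent (each site uses its own random bits), the variance argument reads as follows; the constant $c$ is a free parameter introduced here for illustration.

```latex
\[
\mathrm{Var}[\hat{n}] \;=\; \sum_{i=1}^{k} \mathrm{Var}[\hat{n}_i]
\;\le\; k \cdot \frac{(\epsilon n)^2}{k} \;=\; (\epsilon n)^2,
\qquad
\Pr\bigl[\,|\hat{n} - n| \ge c\,\epsilon n\,\bigr]
\;\le\; \frac{\mathrm{Var}[\hat{n}]}{(c\,\epsilon n)^2} \;\le\; \frac{1}{c^2}.
\]
```

The second inequality is Chebyshev's inequality; taking $c = 2$ gives error at most $2\epsilon n$ with probability at least 3/4.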

1.3 Other related work 

As distributed tracking is closely related to the streaming and the $k$-party communication model, it could
be enlightening to compare with the known results of the above problems in these models. As mentioned
earlier, the count-tracking problem is trivial in both models, requiring $O(1)$ space in the streaming model
and $O(k)$ communication in the $k$-party communication model.

Both the frequency-tracking and rank-tracking problems have been extensively studied in the streaming
model with a long history. The former was first resolved by the MG algorithm [20] with the optimal space
$O(1/\epsilon)$, though several other algorithms with the same space bound have been proposed later on [18, 19].
The rank problem is also one of the earliest problems studied in the streaming model [21]. The best
deterministic algorithm to date is the one by Greenwald and Khanna [12]. It uses $O(1/\epsilon \cdot \log n)$ working space
to maintain a structure of size $O(1/\epsilon)$, from which any rank can be estimated with error $\epsilon n$. Note that the
rank problem is often studied as the quantiles problem in the literature. Recall that for any $0 < \phi < 1$, the
$\phi$-quantile of $A(t)$ is the element in $A(t)$ that ranks at $\lfloor \phi n \rfloor$, while an $\epsilon$-approximate $\phi$-quantile is any element
that ranks between $(\phi-\epsilon)n$ and $(\phi+\epsilon)n$. Clearly, if we have the data structure for one problem, we can do
a binary search to solve the other. Thus the two problems are equivalent for deterministic algorithms. For
algorithms with probabilistic guarantees, we need all $O(\log(1/\epsilon))$ decisions in the binary search to succeed,
which requires the failure probability to be lowered by an $O(\log(1/\epsilon))$ factor. By running $O(\log\log(1/\epsilon))$
independent copies of the algorithm, this is not a problem. So the two problems differ by at most a factor of
$O(\log\log(1/\epsilon))$.
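The binary search from rank queries to quantiles can be sketched as below. The candidate list, the `rank_of` callback, and the function name are assumptions for illustration; the sketch also assumes that `rank_of` is monotone over the sorted candidates, which holds for an exact oracle but needs care with a noisy one.

```python
import bisect


def approx_quantile(sorted_candidates, rank_of, phi, n):
    """Binary search for the first candidate whose rank reaches floor(phi * n)."""
    target = int(phi * n)
    lo, hi = 0, len(sorted_candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if rank_of(sorted_candidates[mid]) < target:
            lo = mid + 1      # rank too small: the quantile lies to the right
        else:
            hi = mid          # rank large enough: keep this candidate in range
    return sorted_candidates[lo]


data = list(range(0, 1000, 10))                  # hypothetical data set, n = 100
exact_rank = lambda x: bisect.bisect_left(data, x)
print(approx_quantile(data, exact_rank, phi=0.5, n=len(data)))
```

Each loop iteration is one of the decisions that must succeed when the rank oracle is randomized, which is why the failure probability per query has to be lowered before the search is run.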

The existing streaming algorithms for the frequency and rank problems can be used to solve the one-shot
version of the problem in the $k$-party communication model easily. More precisely, we use a streaming
algorithm to summarize the data set at each site with a structure of size $O(1/\epsilon)$, and then send these
summary structures to the coordinator, resulting in a communication cost of $O(k/\epsilon)$. Recently, we designed
randomized algorithms for these two problems with $O(\sqrt{k}/\epsilon)$ communication [13, 14], which have just been
shown to be near-optimal in an unpublished manuscript [26]. Thus, the results in this paper demonstrate that
the seemingly much more challenging tracking problem, which requires us to solve the one-shot problem
continuously at all times, is only harder by an $O(\log N)$ factor (except for the count-tracking problem, which
is much harder than its one-shot version).

Finally, we should mention that all these distributed tracking problems have been studied in the database
community previously, but mostly using heuristics. Keralapura et al. [16] approached the count-tracking
problem using prediction models, which do not work under adversarial inputs. Babcock and Olston [3]
studied the top-$k$ tracking problem, a variant of the frequency (heavy hitters) tracking problem, but did
not offer a theoretical analysis. The rank-tracking problem was first studied by Cormode et al. [6]; their
algorithm has a communication cost of $O(k/\epsilon^2 \cdot \log N)$ under certain inputs.

2 Tracking Distributed Count 

2.1 The algorithm 

The algorithm with a fixed $p$. Let $p$ be a parameter to be determined later. For now we will assume that $p$
is fixed. The algorithm is very simple: Whenever site $S_i$ receives an element (hence $n_i$ gets incremented by
one), it sends the latest value of $n_i$ to the coordinator with probability $p$. Let $\bar{n}_i$ be the last updated value of
$n_i$ received by the coordinator. We first estimate each $n_i$ by

$$\hat{n}_i = \begin{cases} \bar{n}_i - 1 + 1/p, & \text{if } \bar{n}_i \text{ exists;} \\ 0, & \text{else.} \end{cases} \qquad (1)$$

Then we estimate $n$ as $\hat{n} = \sum_i \hat{n}_i$.
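The fixed-$p$ protocol and the estimator (1) can be simulated in a few lines. This is a sketch under assumptions: the function name, the use of `None` for a site that has never reported, and the round-robin workload are illustrative choices, not part of the paper.

```python
import random


def run_fixed_p(arrivals, k, p, rng):
    """Simulate the fixed-p protocol; returns (estimate, true count, messages sent)."""
    counts = [0] * k           # n_i
    last_sent = [None] * k     # last value of n_i received by the coordinator, if any
    messages = 0
    for site in arrivals:
        counts[site] += 1
        if rng.random() < p:   # report the latest counter value with probability p
            last_sent[site] = counts[site]
            messages += 1
    # Estimator (1): last reported value - 1 + 1/p if a report exists, else 0.
    estimate = sum(v - 1 + 1 / p for v in last_sent if v is not None)
    return estimate, sum(counts), messages


rng = random.Random(1)                       # fixed seed, for repeatability only
arrivals = [i % 4 for i in range(2000)]      # hypothetical round-robin workload
print(run_fixed_p(arrivals, k=4, p=0.1, rng=rng))
```

With `p = 1` every element is reported and the estimate equals the true count; smaller `p` trades accuracy for fewer messages.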



Analysis As mentioned in the introduction, our analysis will hold for any given one time instance. It is 
also important to note that this given time instance shall not depend on the randomization internal to the 
algorithm. 

We show that each $\hat{n}_i$ is an unbiased estimator of $n_i$ with variance at most $1/p^2$. This is very intuitive,
since $n_i - \bar{n}_i$ is the number of failed trials until the site decides to send an update to the coordinator, when
we look backward from the current time instance. This follows a geometric distribution with parameter $p$,
but not quite, as it is bounded by $n_i$. This is why we need to separate the two cases in (1). A more careful
analysis is given below:

Lemma 2.1 $E[\hat{n}_i] = n_i$; $\mathrm{Var}[\hat{n}_i] \le 1/p^2$.

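Proof. Define the random variable

$$Y = \begin{cases} n_i - \bar{n}_i + 1, & \text{if } \bar{n}_i \text{ exists;} \\ n_i + 1/p, & \text{else.} \end{cases}$$

Now we can rewrite $\hat{n}_i$ as $\hat{n}_i = n_i - Y + 1/p$. Thus it suffices to show that $E[Y] = 1/p$ and $\mathrm{Var}[Y] \le 1/p^2$.
Letting $t = n_i - \bar{n}_i + 1$, we have

$$E[Y] = \sum_{t=1}^{n_i} t(1-p)^{t-1}p + (n_i + 1/p)(1-p)^{n_i} = \frac{1}{p},$$

$$\mathrm{Var}[Y] = \sum_{t=1}^{n_i} (t - 1/p)^2 (1-p)^{t-1}p + (n_i + 1/p - 1/p)^2 (1-p)^{n_i} \le \frac{1-p}{p^2} \le \frac{1}{p^2}. \qquad \Box$$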
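The two sums in the proof of Lemma 2.1 can be checked exactly for small parameters with rational arithmetic. The function below is an illustrative check, not part of the paper; it evaluates the sums term by term for a given counter value and a rational $p$.

```python
from fractions import Fraction


def moments_of_Y(n_i, p):
    """Exact mean and variance of Y from the proof of Lemma 2.1, for rational p in (0, 1]."""
    p = Fraction(p)
    q = 1 - p
    # Y = t with probability q^(t-1) * p for t = 1..n_i, and n_i + 1/p with probability q^n_i.
    mean = sum(t * q ** (t - 1) * p for t in range(1, n_i + 1)) + (n_i + 1 / p) * q ** n_i
    var = (sum((t - 1 / p) ** 2 * q ** (t - 1) * p for t in range(1, n_i + 1))
           + Fraction(n_i) ** 2 * q ** n_i)
    return mean, var


for n_i in (1, 5, 40):
    mean, var = moments_of_Y(n_i, Fraction(1, 10))
    print(n_i, mean, var <= Fraction(9, 100) * 100 ** 2 / 100)
```

Using `Fraction` avoids floating-point rounding, so the mean can be compared with $1/p$ by exact equality.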



By Lemma 2.1, we know that $\hat{n}$ is an unbiased estimator of $n$ with variance at most $k/p^2$. Thus, if
$p = \sqrt{k}/(\epsilon n)$, the variance of $\hat{n}$ will be $(\epsilon n)^2$, which means that $\hat{n}$ has error at most $2\epsilon n$ with probability at least
3/4, by Chebyshev's inequality. Rescaling $\epsilon$ and $p$ by a constant will reduce the error to $\epsilon n$ and improve
the success probability to 0.9, as desired. Here we also see that separating the two cases in (1) is actually
important. Otherwise, when $n_i = \Theta(\epsilon n/\sqrt{k})$, there would be a constant probability that $\bar{n}_i$ does not
exist, leading to a bias of $\Theta(1/p) = \Theta(\epsilon n/\sqrt{k})$. Summing over all $k$ sites, this would exceed our error
requirement.

It is interesting to note that similar ideas were used to solve the one-shot quantile problem over dis- 
tributed data [13]. 
Dealing with a decreasing $p$. It is neither possible nor necessary to set $p$ exactly to $\sqrt{k}/(\epsilon n)$. From the analysis
above, it should be clear that keeping $p = \Theta(\sqrt{k}/(\epsilon n))$ will suffice. To do so, we first track $n$ within a
constant factor. This can be done efficiently as follows. Each site $S_i$ keeps track of its own counter $n_i$.
Whenever $n_i$ doubles, it sends an update to the coordinator. The coordinator sets $n' = \sum_{i=1}^{k} n'_i$, where $n'_i$
is the last update of $n_i$. When $n'$ doubles (more precisely, when $n'$ changes by a factor between 2 and 4),
the coordinator broadcasts $n'$ to all the sites. Let $\bar{n}$ be the last broadcast value of $n'$. It is clear that $\bar{n}$ is
always a constant-factor approximation of $n$. The communication cost is $O(k \log N)$, since each site sends
$O(\log N)$ updates to the coordinator and the coordinator broadcasts $O(\log N)$ times, each of which costs $k$
messages. These broadcasts divide the whole tracking period into $O(\log N)$ rounds, and within each round,
$n$ stays within a constant factor of $\bar{n}$, the broadcast value at the beginning of the round.
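The constant-factor tracking step described above can be sketched as a simulation. Names and the workload are assumptions; the sketch reports a counter when it reaches twice its last reported value and broadcasts when the coordinator's sum reaches twice the last broadcast value.

```python
def coarse_track(arrivals, k):
    """Track n within a constant factor; returns (messages, worst ratio n / broadcast value)."""
    counts = [0] * k       # n_i
    reported = [0] * k     # n'_i: last value sent by site i
    n_prime = 0            # coordinator's sum of the n'_i
    n_bar = 0              # last broadcast value
    n = messages = 0
    worst_ratio = 1.0
    for site in arrivals:
        counts[site] += 1
        n += 1
        if counts[site] >= 2 * reported[site]:     # n_i has doubled (or first element)
            n_prime += counts[site] - reported[site]
            reported[site] = counts[site]
            messages += 1
            if n_prime >= 2 * n_bar:               # n' has at least doubled: broadcast
                n_bar = n_prime
                messages += k                      # a broadcast costs k messages
        worst_ratio = max(worst_ratio, n / n_bar)
    return messages, worst_ratio


print(coarse_track([i % 4 for i in range(4000)], k=4))
```

In this sketch the true count always stays between the last broadcast value and four times that value, while the message count grows only logarithmically in the stream length.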

Now, when $\bar{n} \le \sqrt{k}/\epsilon$, we set $p = 1$. This causes all the first $O(\sqrt{k}/\epsilon)$ elements to be sent to the
coordinator. When $\bar{n} > \sqrt{k}/\epsilon$, we set $p = 1/\lfloor \epsilon\bar{n}/\sqrt{k} \rfloor_2$, where $\lfloor x \rfloor_2$ denotes the largest power of 2 smaller
than $x$. Since $\bar{n}$ is monotonically increasing, $p$ gets halved over the rounds. At the beginning of a round, if
the new $p$ is half⁴ of that in the previous round, each site $S_i$ adjusts its $\bar{n}_i$ appropriately, as follows. First,
with probability 1/2, the site decides that $\bar{n}_i$ remains the same. If so, nothing changes; otherwise, it repeatedly
flips a coin with success probability $p$ (the new $p$). Every failed coin flip decrements $\bar{n}_i$ by one. It does so
until a successful coin flip, or $\bar{n}_i = 0$. Finally, the site informs the coordinator of the new value of $\bar{n}_i$; if
$\bar{n}_i = 0$, the coordinator will treat it as if $\bar{n}_i$ does not exist. It should be clear that after this adjustment, the
whole system looks as if it had always been running with the new $p$.
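The adjustment performed when $p$ is halved can be sketched as follows. The text leaves the exact ordering of the decrement and the coin flips slightly open; this sketch assumes that dropping the old report itself counts as the first failed trial, which is the reading under which the gap to the last report is again geometric with the new $p$. The function name is hypothetical.

```python
import random


def adjust_after_halving(n_bar_i, new_p, rng):
    """Re-sample the position of site i's last report after p is halved to new_p.

    Returns the new last-reported value, or 0 if no report survives.
    """
    if n_bar_i == 0 or rng.random() < 0.5:
        return n_bar_i                 # with probability 1/2 the old report is kept
    n_bar_i -= 1                       # old report dropped: treated as a failed trial
    while n_bar_i > 0 and rng.random() >= new_p:
        n_bar_i -= 1                   # each further failed flip steps back one element
    return n_bar_i


rng = random.Random(5)                 # fixed seed, for repeatability only
print(adjust_after_halving(1000, 0.05, rng))
```

The site only needs its current last-reported value and fresh random bits, so the adjustment fits in the same constant space as the rest of the algorithm.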

It is easy to see that the communication cost in each round is $O(k + pn) = O(k + \sqrt{k}/\epsilon) = O(\sqrt{k}/\epsilon)$,
thus the total cost is $O(\sqrt{k}/\epsilon \cdot \log N)$.

Theorem 2.1 There is an algorithm for the count-tracking problem that, at any time, estimates $n = \sum_i n_i$
within error $\epsilon n$ with probability at least 0.9. It uses $O(1)$ space at each site and $O(\sqrt{k}/\epsilon \cdot \log N)$ total
communication.

2.2 The lower bound 

Before proving the lower bounds, we first state our lower bound model formally, in the context of the count- 
tracking problem. The N elements arrive at the k sites in an online fashion at arbitrary time instances. We do 
not allow spontaneous communication. More precisely, it means that a site is allowed to send out a message 
only if it has just received an element or a message from the coordinator. Likewise, the coordinator is 
allowed to send out messages only if it has just received messages from one or more sites. When a site $S_j$ is
allowed to send out a message, it decides whether it will indeed do so and the content of the message, based
only on its local counter $n_j$ and the message history between $S_j$ and the coordinator, possibly using some
random source. We assume that the site does not look at the current clock. We argue that the clock conveys
no information since the elements arrive at arbitrary and unpredictable time instances. (If the elements arrive
in a predictable fashion, say, one per time step, the problem can be solved without communication at all.)
Similarly, when the coordinator is allowed to send out messages, it makes the decision on where and what to 
send based only on its message history and some random source. We will lower bound the communication 
cost only by the number of messages, regardless of the message size. 

4 To be more precise, the new p might also be a quarter of the previous p, but it can be handled similarly. 
2.2.1 One-way communication lower bound

In this section we show that two-way communication is necessary to achieve the upper bound in
Theorem 2.1, by proving the following lower bound. Recall that we assume $N$ is sufficiently large compared to $k$
and $1/\epsilon$.

Theorem 2.2 If only the sites can send messages to the coordinator but not vice versa, then any randomized
algorithm for the count-tracking problem that, at any time, estimates $n$ within error $\epsilon n$ with probability at
least 0.9 must send $\Omega(k/\epsilon \cdot \log N)$ messages.

Proof. We first define the hard input distribution $\sigma$.

(a) With probability 1/2, all elements arrive at one site that is uniformly picked at random.

(b) Otherwise, the $N$ elements arrive at the $k$ sites in a round-robin fashion, each site receiving $N/k$
elements in the end.

By Yao's minimax principle [28], we only need to argue that any deterministic algorithm with success
probability at least 0.8 under $\sigma$ has expected cost $\Omega(k/\epsilon \cdot \log N)$.

Note that when only one-way communication is allowed, a site decides whether to send messages to the
coordinator only based on its local counter $n_j$. Thus the communication pattern can be essentially described
as follows. Each site $S_j$ has a series of thresholds $t_j^1, t_j^2, \ldots$ such that when $n_j = t_j^i$, the site sends the $i$-th
message to the coordinator. These thresholds should be fixed at the beginning.

We lower bound the communication cost by rounds. Let $W_i$ be the number of elements that have arrived
up until round $i$. We divide the rounds by setting $W_1 = k/\epsilon$, and $W_{i+1} = \lceil (1+\epsilon)W_i \rceil$ for $i \ge 1$. Thus
there are $1/\epsilon \cdot \log(\epsilon N/k)$ rounds, which is $\Omega(1/\epsilon \cdot \log N)$ for sufficiently large $N$.

At the beginning of round $i+1$, suppose that $S_1, S_2, \ldots, S_k$ have already sent $z_1^i, z_2^i, \ldots, z_k^i$ messages
to the coordinator, respectively. Let $t_{\max}^i = (1+\epsilon) \cdot \max\{t_j^{z_j^i} \mid j = 1, 2, \ldots, k\}$. We first observe that there
must be at least $k/2$ sites with their next threshold $t_j^{z_j^i+1} \le t_{\max}^i$. Otherwise, suppose there are fewer than
$k/2$ sites with such next thresholds; then with probability at least 1/4 case (a) happens and the random site
$S_j$ chosen to receive all elements has $t_j^{z_j^i+1} > t_{\max}^i \ge (1+\epsilon)t_j^{z_j^i}$. Thus, with probability at least 1/4 the
algorithm fails when the $t_j^{z_j^i+1}$-th element arrives, contradicting the success guarantee.

On the other hand, with probability 1/2 case (b) happens. In this case all $t_j^{z_j^i}$ ($j = 1, 2, \ldots, k$) are no
more than $W_i/k$, since in case (b), elements arrive at all $k$ sites in turn. In the next $\epsilon W_i$ elements, each site
$S_j$ receives $\epsilon W_i/k$ elements. If the site $S_j$ has $t_j^{z_j^i+1} \le t_{\max}^i$, then it must send a message in this round,
since $W_i/k + \epsilon W_i/k \ge t_{\max}^i \ge t_j^{z_j^i+1}$, that is, its $(z_j^i+1)$-th threshold is triggered. As argued, there are
at least $k/2$ sites with $t_j^{z_j^i+1} \le t_{\max}^i$, so the communication cost in this round is at least $k/2$.

Summing up all rounds, the total communication is at least $\Omega(k/\epsilon \cdot \log N)$. □
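To get a feel for the round structure used in this proof, the number of rounds can be counted numerically. The function name and the sample parameters are assumptions for illustration; the recurrence is the one stated in the proof, with the starting value rounded up to an integer.

```python
import math


def one_way_rounds(N, k, eps):
    """Count rounds W_1 = k/eps, W_{i+1} = ceil((1 + eps) * W_i) with W_i <= N."""
    w = math.ceil(k / eps)
    rounds = 0
    while w <= N:
        rounds += 1
        w = math.ceil((1 + eps) * w)
    return rounds


rounds = one_way_rounds(N=10**6, k=4, eps=0.01)
print(rounds, rounds * 4 // 2)   # number of rounds, and k/2 messages per round
```

Each round forces at least $k/2$ messages in case (b), so the printed product is the message count that the argument charges to this particular choice of $N$, $k$, and $\epsilon$.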

2.2.2 Two-way communication lower bound 

Below we prove two randomized lower bounds when two-way communication is allowed. The first one
justifies the assumption $k \le 1/\epsilon^2$, since otherwise, random sampling will be near-optimal.
Theorem 2.3 Any randomized algorithm for the count-tracking problem that, at any time, estimates $n$
within error $0.1n$ with probability at least 0.9 must exchange $\Omega(k)$ messages.

Proof. The hard input distribution is the same as that in the proof of Theorem 2.2. To prove this lower
bound we are only interested in the number of sites that communicate with the coordinator at least once.
Before any element arrives, we can still assume that each site keeps a triggering threshold. The threshold
of $S_j$ shall remain the same unless it communicates with the coordinator at least once. We argue that there
must be at least $k/2$ sites whose triggering threshold is no more than 1, since otherwise if case (a) happens
and the randomly chosen site is one with a triggering threshold larger than 1, the algorithm will fail, which
would happen with probability at least 1/4. On the other hand, if case (b) happens, then all the sites with
threshold 1 will have to communicate with the coordinator at least once: either their thresholds are triggered
by the round-robin arrival of elements, or they receive a message from the coordinator, which can possibly
change their threshold. □

Finally, we show that the upper bound in Theorem 2.1 is asymptotically tight. We first introduce the 
following primitive problem. 

Definition 2.1 (1-bit) Let $s$ be either $k/2 + \sqrt{k}$ or $k/2 - \sqrt{k}$, each with probability 1/2. From the $k$ sites,
a subset of $s$ sites picked uniformly at random each have bit 1, while the other $k - s$ sites have bit 0. The
goal of the communication problem is for the coordinator to find out the value of $s$ with probability at least
0.8.
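An instance of the 1-bit problem can be drawn as in the sketch below. The function name is hypothetical, and the sketch assumes $k$ is an even perfect square so that both $k/2$ and $\sqrt{k}$ are integers.

```python
import math
import random


def one_bit_instance(k, rng):
    """Draw (s, bits) for the 1-bit problem; assumes k is an even perfect square."""
    root = math.isqrt(k)
    s = k // 2 + root if rng.random() < 0.5 else k // 2 - root
    ones = set(rng.sample(range(k), s))    # the s sites that hold bit 1
    return s, [1 if i in ones else 0 for i in range(k)]


s, bits = one_bit_instance(64, random.Random(0))   # fixed seed, for repeatability only
print(s, sum(bits))
```

The two possible values of `s` differ by only $2\sqrt{k}$ out of $k$ sites, which is what makes the instance hard to resolve without contacting a constant fraction of the sites.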

We will show the following lower bound for this primitive problem. 

Lemma 2.2 Any deterministic algorithm that solves 1-bit has distributional communication complexity
$\Omega(k)$.

Lemma 2.2 immediately implies the following theorem: 

Theorem 2.4 Any randomized algorithm for the count-tracking problem that, at any time, estimates $n$
within error $\epsilon n$ with probability at least 0.9 must exchange $\Omega(\sqrt{k}/\epsilon \cdot \log N)$ messages, when $k \le 1/\epsilon^2$.

Proof. We will again fix a hard input distribution first and then focus on the distributional communication
complexity of deterministic algorithms with success probability at least 0.8. Let $[m] = \{0, 1, \ldots, m-1\}$.
The adversarial input consists of $\ell = \log\frac{\epsilon N}{\sqrt{k}} = \Omega(\log N)$ rounds. We further divide each round $i \in [\ell]$ into
$r = 1/(2\epsilon\sqrt{k})$ subrounds.

The input at round $i \in [\ell]$ is constructed as follows. At each subround $j \in [r]$, we first choose $s$ to be
$k/2 + \sqrt{k}$ or $k/2 - \sqrt{k}$ with equal probability. Then we choose $s$ sites out of the $k$ sites uniformly at random
and send $2^i$ elements to each of them (the order does not matter).

It is easy to see that at the end of each subround in round $i$, the total number of items is no more than
$T_i = \sqrt{k}/\epsilon \cdot 2^i$. Thus after $s \cdot 2^i$ elements have arrived in a subround, the algorithm has to correctly identify
the value of $s$ with probability at least 0.8, since otherwise with probability at least 0.2 the estimation of the
algorithm will deviate from the true value by at least $\sqrt{k} \cdot 2^i \ge \epsilon T_i$, violating the success guarantee of the
algorithm. This is exactly the 1-bit problem defined above. By Lemma 2.2, the communication cost of each
subround is $\Omega(k)$. Summing over all $r$ subrounds and then all $\ell$ rounds, we have that the total
communication is at least $\ell \cdot r \cdot \Omega(k) \ge \Omega(\sqrt{k}/\epsilon \cdot \log N)$. □
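The bound on the number of items can be checked with a short calculation. It assumes $k \ge 4$, which is our added assumption, and writes $\ell$ for the number of rounds and $r = 1/(2\epsilon\sqrt{k})$ for the number of subrounds per round.

```latex
\[
\sum_{i'=0}^{i} r \Bigl(\tfrac{k}{2} + \sqrt{k}\Bigr) 2^{i'}
\;<\; \frac{1}{2\epsilon\sqrt{k}} \Bigl(\tfrac{k}{2} + \sqrt{k}\Bigr) 2^{i+1}
\;=\; \frac{1}{\epsilon}\Bigl(\tfrac{\sqrt{k}}{2} + 1\Bigr) 2^{i}
\;\le\; \frac{\sqrt{k}}{\epsilon} \cdot 2^{i} \;=\; T_i
\quad (k \ge 4),
\]
\[
\ell \cdot r \cdot k \;=\; \ell \cdot \frac{k}{2\epsilon\sqrt{k}} \;=\; \frac{\sqrt{k}}{2\epsilon} \cdot \ell .
\]
```

The first line bounds the items delivered through the end of round $i$; the second shows where the $\sqrt{k}/\epsilon$ factor in the final bound comes from once each subround costs $\Omega(k)$.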
Now we prove Lemma 2.2.

Proof (of Lemma 2.2). First of all, observe that whenever the coordinator communicates with a site, the
site can send its whole input (i.e., its only bit) to the coordinator. After that, the coordinator knows all the 
information about that site and does not need to communicate with it further. Therefore all that we need to 
investigate is the number of sites the coordinator needs to communicate with. 

There can be two types of actions in the protocol. 

(a) A site initiates a communication with the coordinator based on the bit it has. 

(b) The coordinator, based on all the information it has gathered so far, asks some site to send its bit. 

Note that if a type (b) communication takes place before a type (a) communication, we can always swap 
the two, since this only gives the coordinator more information at an earlier stage. Thus we can assume that 
all the type (a) communications happen before type (b) ones. 

In the first phase where all the type (a) communications happen, let x be the number of sites that send bit 0
to the coordinator, and y be the number of sites that send bit 1 to the coordinator. If $E[x + y] = \Omega(k)$, then
we are done. So let us assume that $E[x + y] = o(k)$. By Markov's inequality we have that, with probability
at least 0.9, $x + y = o(k)$. After the first phase, the problem becomes that there are $s' = s - y = s - o(k)$
sites having bit 1, out of a total of $k' = k - x - y = k - o(k)$ sites. The coordinator needs to figure out the
exact value of $s'$ with probability at least $0.8 - (1 - 0.9) = 0.7$.

In the second phase where all type (b) communication happens, from the coordinator's perspective, all
the remaining sites are still symmetric (by the random input we choose), therefore the best it can do is to
probe an arbitrary site among those that it has not communicated with. This is still true even after the
coordinator has probed some of the remaining sites. Therefore, the problem boils down to the following: The
coordinator picks z sites out of the remaining $k'$ sites to communicate with and then decides the value of $s'$ with
success probability at least 0.7. We call this problem the sampling problem. We can show that to achieve
the success guarantee, z should be at least $\Omega(k)$. This result is perhaps folklore; proofs of more general
versions of this problem can be found in [4] (Chapter 4), and also [22,27]. We include a simpler proof in
the appendix for completeness. With this we conclude the proof of Lemma 2.2. □



3 Tracking Distributed Frequencies 

In the frequency-tracking problem, A(t) (we omit "(t)" when the context is clear) is a multiset and the goal is
to track the frequency of any item j within error $\varepsilon n$. Let $f_{ij}$ denote the local frequency of element j in $A_i$,
and let $f_j = \sum_{i=1}^{k} f_{ij}$.

3.1 The algorithm 

The algorithm with a fixed p As in Section 2.1 we first describe the algorithm with a fixed parameter p.
If each site tracks the local frequencies $f_{ij}$ exactly, we can essentially use the count-tracking algorithm to
track the $f_{ij}$'s. To achieve small space, we make use of the following algorithm due to Manku and Motwani
[18] at each site $S_i$: We maintain a list $L_i$ of counters. When an element j arrives at $S_i$, it first checks if there
is a counter $c_{ij}$ for j in $L_i$. If yes, we increase $c_{ij}$ by 1. Otherwise, we sample this element with probability
p. If it is sampled, we insert a counter $c_{ij}$, initialized to 1, into $L_i$. It is easy to see that the expected size of
$L_i$ is $O(p n_i)$.

Next, we follow a similar strategy as in the count-tracking algorithm: The site reports the counter $c_{ij}$ to
the coordinator when it is first added to the counter list with an initial value of 1. Afterward, for every j that
is arriving, the site always increments $c_{ij}$ as before, but only sends the updated counter to the coordinator
with probability p. We use $\bar{c}_{ij}$ to denote the last updated value of $c_{ij}$.
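
The site-side behaviour described so far can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `FrequencySite`, the dictionary used for the list $L_i$, and the `("counter", j, value)` message tuple are all assumptions made for the example.

```python
import random

class FrequencySite:
    """One site S_i running the fixed-p counter protocol (illustrative sketch)."""

    def __init__(self, p, rng=None):
        self.p = p
        self.rng = rng or random.Random()
        self.counters = {}  # plays the role of the list L_i: item j -> c_ij

    def process(self, j):
        """Handle one arrival of item j; return the messages sent to the coordinator."""
        msgs = []
        if j in self.counters:
            # A counter exists: always increment, report only with probability p.
            self.counters[j] += 1
            if self.rng.random() < self.p:
                msgs.append(("counter", j, self.counters[j]))
        elif self.rng.random() < self.p:
            # No counter yet: the element is sampled, so create and report c_ij = 1.
            self.counters[j] = 1
            msgs.append(("counter", j, 1))
        return msgs
```

The coordinator would keep, for every (site, item) pair, the most recent value carried by a `"counter"` message; that stored value corresponds to the last updated counter value.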

The tricky part is how the coordinator estimates $f_{ij}$, hence $f_j$. Fix any time instance. The difference
between $f_{ij}$ and $\bar{c}_{ij}$ comes from two sources: one is the number of j's missed before a copy is sampled, and
the other is the number of j's that arrive after the last update of $\bar{c}_{ij}$. It is easy to see that both errors follow
the same distribution as $n_i - \bar{n}_i$ in the count-tracking algorithm. Thus it is tempting to modify (1) as

$$\hat{f}_{ij} = \begin{cases} \bar{c}_{ij} - 2 + 2/p, & \text{if } \bar{c}_{ij} \text{ exists;} \\ 0, & \text{else.} \end{cases} \qquad (2)$$

However, this estimator is biased and its bias might be as large as $\Theta(\varepsilon n/\sqrt{k})$. Summing over k streams,
this would exceed our error guarantee. To see this, consider the $f_{ij}$ copies of j. Effectively, the site samples
every copy with probability p, while $\bar{c}_{ij} - 2$ is exactly the number of copies between the first and the last
sampled copy (excluding both). We define $X_1$ as before:

$$X_1 = \begin{cases} t_1, & \text{if the } t_1\text{th copy is the first one sampled;} \\ f_{ij} + 1/p, & \text{if none is sampled.} \end{cases}$$

We define $X_2$ in exactly the same way, except that we examine these $f_{ij}$ copies backward:

$$X_2 = \begin{cases} t_2, & \text{if the } t_2\text{th copy is the first one sampled in the reverse order;} \\ f_{ij} + 1/p, & \text{if none is sampled.} \end{cases}$$

It is clear that $X_1$ and $X_2$ have the same distribution with $E[X_1] = E[X_2] = 1/p$ (by Lemma 2.1), so
$\hat{f}_{ij} = f_{ij} - (X_1 + X_2) + 2/p$ is unbiased. Since $\bar{c}_{ij} - 2 = f_{ij} - t_1 - t_2$, the correct unbiased estimator
should be

$$\hat{f}_{ij} = \begin{cases} \bar{c}_{ij} - 2 + 2/p, & \text{if } \bar{c}_{ij} \text{ exists;} \\ -f_{ij}, & \text{else.} \end{cases} \qquad (3)$$

Compared with the previous wrong estimator (2), the main difference is how the estimation is done when
no copy of j is sampled. When $f_{ij} = \Theta(\varepsilon n/\sqrt{k})$ and $p = \Theta(1/f_{ij})$, this happens with constant probability,
which would result in a bias of $\Theta(f_{ij}) = \Theta(\varepsilon n/\sqrt{k})$.

However, the correct estimator (3) depends on $f_{ij}$, the quantity we want to estimate in the first place.
The workaround is to use another unbiased estimator for $f_{ij}$ when $\bar{c}_{ij}$ is not yet available. It turns out that we
can just use simple random sampling: The site samples every element with probability p (this is independent
of the sampling process that maintains the list $L_i$), and sends the sampled elements to the coordinator. Let
$d_{ij}$ be the number of sampled copies of j received by the coordinator from site i. The final estimator for $f_{ij}$
is

$$\hat{f}_{ij} = \begin{cases} \bar{c}_{ij} - 2 + 2/p, & \text{if } \bar{c}_{ij} \text{ exists;} \\ -d_{ij}/p, & \text{else.} \end{cases} \qquad (4)$$

Since $d_{ij}$ is independent of $\bar{c}_{ij}$, the estimator is still unbiased. Below we analyze its variance.
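
The difference between the tempting estimator (2) and the final estimator (4) can be checked with a small Monte Carlo experiment for a single site and a single item. This is a sketch under simplifying assumptions: the function names `one_trial` and `mean_estimates`, the trial counts, and the fixed seed are illustrative choices, and the counter process is simulated directly as "every copy is sampled with probability p".

```python
import random

def one_trial(f, p, rng):
    """Simulate f arrivals of one item at one site; return (estimate (2), estimate (4))."""
    # Counter process: effectively every copy is sampled with probability p.
    hits = [t for t in range(1, f + 1) if rng.random() < p]
    # Independent simple random sample, giving d_ij.
    d = sum(1 for _ in range(f) if rng.random() < p)
    if hits:
        c_bar = hits[-1] - hits[0] + 1   # last reported counter value
        est = c_bar - 2 + 2 / p
        return est, est
    return 0.0, -d / p                   # (2) falls back to 0, (4) to -d_ij / p

def mean_estimates(f, p, trials, seed=0):
    """Average both estimators over independent trials."""
    rng = random.Random(seed)
    naive_sum = final_sum = 0.0
    for _ in range(trials):
        naive, final = one_trial(f, p, rng)
        naive_sum += naive
        final_sum += final
    return naive_sum / trials, final_sum / trials
```

With $f_{ij} = 20$ and $p = 0.05$ (so that $p = \Theta(1/f_{ij})$), the average of (4) should land close to 20, while the average of (2) should overshoot by roughly $f_{ij}(1-p)^{f_{ij}}$, which is the probability-weighted contribution of the case where no copy is sampled.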



Analysis Intuitively, the variance is not affected by using the simple random sampling estimator $d_{ij}/p$,
because it is only used when $\bar{c}_{ij}$ is not available, which means that $f_{ij}$ is likely to be small, and when $f_{ij}$ is
small, $d_{ij}/p$ actually has a small variance. When $f_{ij}$ is large, $d_{ij}/p$ has a large variance, but we will use it
only with small probability. Below we give a formal proof.

Lemma 3.1 $E[\hat{f}_{ij}] = f_{ij}$; $\mathrm{Var}[\hat{f}_{ij}] = O(1/p^2)$.

Proof. We first analyze the estimator of (3). That $E[\hat{f}_{ij}] = f_{ij}$ follows from the discussion above.
Its variance is $\mathrm{Var}[\hat{f}_{ij}] = \mathrm{Var}[X_1 + X_2]$. Note that $X_1$ and $X_2$ are not independent, but they both have
expectation $1/p$ and variance $\le 1/p^2$. We first rewrite

$$\begin{aligned}
\mathrm{Var}[X_1 + X_2] &= E[X_1^2 + X_2^2 + 2X_1X_2] - E[X_1 + X_2]^2 \\
&= \mathrm{Var}[X_1] + E[X_1]^2 + \mathrm{Var}[X_2] + E[X_2]^2 + 2E[X_1X_2] - (E[X_1] + E[X_2])^2 \\
&\le 4/p^2 + 2E[X_1X_2] - 4/p^2 \le 2E[X_1X_2].
\end{aligned}$$

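Let $\mathcal{E}_t$ be the event that the tth copy of j is the first being sampled. We have

$$\begin{aligned}
E[X_1X_2] &= \sum_{t=1}^{f_{ij}} (1-p)^{t-1} p \cdot t \cdot E[X_2 \mid \mathcal{E}_t] + (1-p)^{f_{ij}} (f_{ij} + 1/p)^2 \\
&= \sum_{t=1}^{f_{ij}} (1-p)^{t-1} p\, t \left( (1-p)^{f_{ij}-t} (f_{ij} - t + 1) + \sum_{l=1}^{f_{ij}-t} (1-p)^{l-1} p\, l \right) + (1-p)^{f_{ij}} (f_{ij} + 1/p)^2 \\
&\le \frac{1}{p^2} + (1-p)^{f_{ij}} (f_{ij} + 1/p)^2.
\end{aligned}$$

Let $c = f_{ij} p$. If $c < 2$, then $f_{ij} < 2/p$, and the variance is $O(1/p^2)$. Otherwise

$$E[X_1X_2] \le \frac{2}{p^2} + \frac{c^2}{p^2 e^c} + \frac{2c}{p^2 e^c} = O(1/p^2),$$

since $c^2 < e^c$ when $c > 2$.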

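Next we analyze the final estimator of (4), which we denote by $\hat{f}_{ij}$ here; we write $\hat{f}^{(3)}_{ij}$ for the estimator
of (3). First, $d_{ij}$ is the sum of $f_{ij}$ Bernoulli random variables
with probability p, so $E[d_{ij}/p] = f_{ij}$ and $\mathrm{Var}[d_{ij}/p] \le f_{ij}p/p^2 = f_{ij}/p$. Let $\mathcal{E}_*$ be the event that $\bar{c}_{ij}$ is
available, i.e., at least one copy of j is sampled, and $\mathcal{E}_0 = \overline{\mathcal{E}_*}$, then

$$\begin{aligned}
E[\hat{f}_{ij}] &= E[\hat{f}_{ij} \mid \mathcal{E}_*] \Pr[\mathcal{E}_*] + E[-d_{ij}/p \mid \mathcal{E}_0] \Pr[\mathcal{E}_0] \\
&= E[\hat{f}_{ij} \mid \mathcal{E}_*] \Pr[\mathcal{E}_*] + (-f_{ij}) \Pr[\mathcal{E}_0],
\end{aligned}$$

which is exactly the expectation of the estimator of (3), namely $f_{ij}$. The variance is

$$\begin{aligned}
\mathrm{Var}[\hat{f}_{ij}] &= E[\hat{f}_{ij}^2] - E[\hat{f}_{ij}]^2 \\
&= E[\hat{f}_{ij}^2 \mid \mathcal{E}_*] \Pr[\mathcal{E}_*] + E[(d_{ij}/p)^2 \mid \mathcal{E}_0] \Pr[\mathcal{E}_0] - f_{ij}^2 \\
&= E[\hat{f}_{ij}^2 \mid \mathcal{E}_*] \Pr[\mathcal{E}_*] - f_{ij}^2 + E[(d_{ij}/p)^2] \Pr[\mathcal{E}_0] \\
&= E[\hat{f}_{ij}^2 \mid \mathcal{E}_*] \Pr[\mathcal{E}_*] - f_{ij}^2 + (\mathrm{Var}[d_{ij}/p] + f_{ij}^2) \Pr[\mathcal{E}_0].
\end{aligned}$$

Note that the two estimators coincide on $\mathcal{E}_*$, and

$$\begin{aligned}
\mathrm{Var}[\hat{f}^{(3)}_{ij}] &= E[(\hat{f}^{(3)}_{ij})^2] - f_{ij}^2 \\
&= E[(\hat{f}^{(3)}_{ij})^2 \mid \mathcal{E}_*] \Pr[\mathcal{E}_*] + E[(\hat{f}^{(3)}_{ij})^2 \mid \mathcal{E}_0] \Pr[\mathcal{E}_0] - f_{ij}^2 \\
&= E[\hat{f}_{ij}^2 \mid \mathcal{E}_*] \Pr[\mathcal{E}_*] + f_{ij}^2 \Pr[\mathcal{E}_0] - f_{ij}^2,
\end{aligned}$$

so

$$\begin{aligned}
\mathrm{Var}[\hat{f}_{ij}] &= \mathrm{Var}[\hat{f}^{(3)}_{ij}] + \mathrm{Var}[d_{ij}/p] \Pr[\mathcal{E}_0] \\
&\le \mathrm{Var}[\hat{f}^{(3)}_{ij}] + \frac{f_{ij}}{p} (1-p)^{f_{ij}}.
\end{aligned}$$

Due to the same reason as above, the second term is $O(1/p^2)$, and the proof completes. □
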

Dealing with a decreasing p As in the count-tracking algorithm, we divide the whole tracking period into
$O(\log N)$ rounds. Within each round, n stays within a constant factor of $\hat{n}$, while $\hat{n}$ remains fixed for the
whole round.

Within a round, we set the parameter p for all sites to be $p = 1/\lfloor \varepsilon\hat{n}/\sqrt{k} \rfloor_2$. When we proceed to a
new round, all sites clear their memory and we start a new copy of the algorithm from scratch with the new
p. Given an item j, the coordinator estimates its frequency from each round separately, and adds them up.
Since the variance in a round is $O(k/p^2)$ and $1/p$ increases geometrically over the rounds, the total variance is
asymptotically bounded by the variance of the last round, i.e., $O((\varepsilon n)^2)$, as desired.
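
The geometric-sum step can be written out explicitly. The following worked bound is an illustration only: it assumes that $\hat{n}$ exactly doubles from one round to the next, writes $\hat{n}_r$ and $p_r$ for the values used in round r (indices introduced here), and ignores the rounding of $1/p$ to a power of two.

```latex
% Assumptions (for illustration): \hat{n}_r = 2^r \hat{n}_0 and p_r = \sqrt{k} / (\varepsilon \hat{n}_r),
% with R the index of the current (last) round.
\sum_{r=0}^{R} O\!\left(\frac{k}{p_r^2}\right)
  = \sum_{r=0}^{R} O\!\left((\varepsilon \hat{n}_r)^2\right)
  = O\!\left((\varepsilon \hat{n}_R)^2\right) \cdot \sum_{r=0}^{R} 4^{-(R-r)}
  \le \frac{4}{3} \cdot O\!\left((\varepsilon \hat{n}_R)^2\right)
  = O\!\left((\varepsilon n)^2\right).
```

The last equality uses that n stays within a constant factor of $\hat{n}_R$ during the current round.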

The space used at some site could still be large, since the site may receive too many elements in a
round. If all the $O(\hat{n})$ elements in a round have gone to the same site, the site will need to use space
$O(p\hat{n}) = O(\sqrt{k}/\varepsilon)$. To bound the space, we restrict the amount of space used by each site. More precisely,
when a site receives more than $\hat{n}/k$ elements, it sends a message to the coordinator for notification, clears
its memory, and starts a new copy of the algorithm from scratch. The coordinator will treat the new copy as
if it were a new site, while the original site no longer receives more elements. Now the space used at each
site is at most $p\hat{n}/k = O(1/(\varepsilon\sqrt{k}))$. Since there are at most $O(k)$ such new "virtual" sites ever created in a
round, this does not affect the variance by more than a constant factor.

It remains to show that the total communication cost is $O(\sqrt{k}/\varepsilon \cdot \log N)$. From earlier we know that
there are $O(\log N)$ rounds; within each round, $\hat{n}$ is the same and n stays within $\Theta(\hat{n})$. Focus on one round.
For each arriving element, the site $S_i$ updates $\bar{c}_{ij}$ with probability p and also independently samples it with
probability p to maintain $d_{ij}$. This costs $O(\hat{n} \cdot p) = O(\sqrt{k}/\varepsilon)$ communication.
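
The per-round quantities can be computed with a short helper. This is a sketch under stated assumptions: `floor_pow2` takes the subscript-2 floor to mean the largest power of two not exceeding its argument, the function and key names are hypothetical, and the message count assumes exactly $\hat{n}$ arrivals in the round rather than $O(\hat{n})$.

```python
import math

def floor_pow2(x):
    """Largest power of two not exceeding x (assumed meaning of the subscript-2 floor)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return 1 << (int(x).bit_length() - 1)

def round_parameters(n_hat, k, eps):
    """Return p and the expected per-round costs for one round with estimate n_hat."""
    p = 1.0 / floor_pow2(eps * n_hat / math.sqrt(k))
    return {
        "p": p,
        # expected number of counters a site holds before it restarts (at n_hat / k elements)
        "space_per_site": p * n_hat / k,
        # counter updates plus independent samples, each sent with probability p per arrival
        "messages_per_round": 2 * p * n_hat,
    }
```

For instance, with $\hat{n} = 2^{20}$, k = 16 and $\varepsilon = 1/64$, this gives $p = 1/4096$, about $16 = 1/(\varepsilon\sqrt{k})$ counters per site, and about $512 = 2\sqrt{k}/\varepsilon$ messages in the round.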

Theorem 3.1 There is an algorithm for the frequency-tracking problem that, at any time, estimates the
frequency of any element within error $\varepsilon n$ with probability at least 0.9. It uses $O(1/(\varepsilon\sqrt{k}))$ space at each site
and $O(\sqrt{k}/\varepsilon \cdot \log N)$ communication.

3.2 Space lower bound 

It is easy to see that the communication lower bounds for the count-tracking problem also hold for the 
frequency-tracking problem. In this section, we prove the following space-communication trade-off. 






Theorem 3.2 Consider any randomized algorithm for the frequency-tracking problem that, at any time,
estimates the frequency of any element within error $\varepsilon n$ with probability at least 0.9. If the algorithm uses
C bits of communication and uses M bits of space per site, then we must have $C \cdot M = \Omega(\log N/\varepsilon^2)$,
assuming $k < 1/\varepsilon^2$.

Thus, if the communication cost is $C = O(\sqrt{k}/\varepsilon \cdot \log N)$ bits, the space required per site is at least
$\Omega(1/(\varepsilon\sqrt{k}))$ bits, as claimed in Table 1. Note that, however, our algorithm of the previous section uses
$O(\sqrt{k}/\varepsilon \cdot \log N)$ words of communication and $O(1/(\varepsilon\sqrt{k}))$ words of space, so there is still a small gap
between the lower and upper bound. Interestingly, this lower bound also shows that the random sampling
algorithm [9] (see Table 1) actually attains the other end of this space-communication trade-off (ignoring
the word/bit difference).

Proof (of Theorem 3.2). We will use a result in [26] which states that, under the k-party communication
model, there is an input distribution $\mu_k$ such that any algorithm that solves the one-shot version of the
problem under $\mu_k$ with error $2\varepsilon n$ with probability 0.9 needs at least $c\sqrt{k}/\varepsilon$ bits of communication for some
constant c, assuming $k < 1/\varepsilon^2$. Moreover, any algorithm that solves $\ell$ independent copies of the one-shot
version of the problem needs at least $\ell \cdot c\sqrt{k}/\varepsilon$ bits of communication.

We will consider the problem over $\rho k$ sites, for some integer $\rho \ge 1$ to be determined later. We divide
the whole tracking period into $\log N$ rounds. In each round $i = 1, \ldots, \log N$, we generate an input independently
chosen from distribution $\mu_{\rho k}$ to the sites. We pick elements from a different domain for every round
so that we have $\log N$ independent instances of the problem. In round i, for every element e picked from
$\mu_{\rho k}$ for any site, we replace it with $2^{i-1}$ copies of e. We arrange the element arrivals in a round so that site
$S_1$ gets all its elements first, then $S_2$ gets all its elements, and so on. We will only require the continuous
tracking algorithm to solve the frequency estimation problem at the end of each round. Since the last
round always contains half of all the elements that have arrived so far, the algorithm must solve the problem
for the elements in each round, namely, $\log N$ independent instances of the one-shot problem. By the result
in [26], the communication cost to solve all these instances of the problem is at least $c\sqrt{\rho k}/\varepsilon \cdot \log N$.

Let $A_k$ be a continuous tracking algorithm over k sites that communicates C bits in total and uses M bits
of space per site. Below we show how to solve the problem over the $\rho k$ sites in each round, by simulating the
k-site algorithm $A_k$. In each round, we start the simulation with sites $S_1, \ldots, S_k$. Whenever $A_k$ exchanges a
message, we do the same. When $S_1$ has received all its elements, it sends its memory content to $S_{k+1}$, which
then takes the role of $S_1$ in the simulation and continues. Similarly, when $S_2$ has received all its elements, it
sends its memory content to $S_{k+2}$, which replaces $S_2$ in the simulation. In general, when $S_j$ is done with all
its elements, it passes its role to $S_{j+k}$. When $S_{\rho k}$ is done, the simulation finishes for this round. $S_{\rho k}$ then
sends a broadcast message and we proceed to the next round.

Let us analyze the communication cost of the simulation. First, we exchange exactly the same messages
as $A_k$ does, which costs C. We also communicate $(\rho - 1)k$ memory snapshots and a broadcast message in
each round, which costs at most $\rho k M \log N$ over all rounds. Thus, we have

$$C + \rho k M \log N \ge c\sqrt{\rho k}/\varepsilon \cdot \log N.$$

Rearranging,

$$M \ge \frac{c}{\varepsilon\sqrt{\rho k}} - \frac{C}{\rho k \log N} = \frac{1}{\sqrt{\rho k}} \left( \frac{c}{\varepsilon} - \frac{C}{\sqrt{\rho k} \log N} \right).$$

Thus, if we set $\sqrt{\rho} = \frac{2C\varepsilon}{c\sqrt{k} \log N}$, then

$$M \ge \frac{c}{2\varepsilon\sqrt{\rho k}} = \Omega\left( \frac{\log N}{C\varepsilon^2} \right),$$

as claimed.

4 Tracking Distributed Ranks

On a stream of n elements, an algorithm that produces an unbiased estimator for any rank with variance
$O((\varepsilon n)^2)$ was presented in [24], which has been very recently improved and made to work in a stronger
model [1]. It uses $O(1/\varepsilon \cdot \log^{1.5}(1/\varepsilon))$ working space to maintain a rank estimation summary structure of
size $O(1/\varepsilon)$. We call this algorithm A and will use it as a black box in our distributed tracking algorithm.



The overall algorithm As before, with $O(k \log N)$ communication, we first track $\hat{n}$, a constant factor
approximation of the current n. This also divides the tracking period into $O(\log N)$ rounds. The $O(\hat{n})$
elements arriving in a round are divided into chunks of size at most $\hat{n}/k$, each processed by an instance of
algorithm C, described below. A site may receive more than $\hat{n}/k$ elements. When the $(\hat{n}/k + 1)$th element
arrives, the site finishes the current instance of C, and starts a new one, which will process the next $\hat{n}/k$
elements, and so on.



Algorithm C Algorithm C reads at most $\hat{n}/k$ elements, and divides them into blocks of size $b = \varepsilon\hat{n}/\sqrt{k}$,
so there are at most $1/(\varepsilon\sqrt{k})$ blocks. We build a balanced binary tree on the blocks in the arrival order, and the
height of the tree is $h \le \log\frac{1}{\varepsilon\sqrt{k}}$. For each node v in the tree, let D(v) be all the elements contained in
the leaves in the subtree rooted at v. For each D(v), we start an instance of A, denoted as $A_v$, to process
its elements as they arrive. We say that v is active if $A_v$ is still accepting elements. For a node v at level $\ell$
(the leaves are said to be on level 0), the error parameter of $A_v$ is set to $2^{-\ell}/\sqrt{h}$. We say v is full if all the
elements in D(v) have arrived. When v is full, we send the summary computed by $A_v$ to the coordinator,
and free the space used by $A_v$. Furthermore, for each element that is arriving, we sample it with probability
$p = \sqrt{k}/(\varepsilon\hat{n})$, and if it is sampled, we send it to the coordinator.

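Analysis of costs We first analyze the various costs of C. At any time there are at most h active nodes, one
at each level, so the space used by C is at most

$$\sum_{\ell=0}^{h} O\left( 2^{\ell}\sqrt{h} \log^{1.5}(2^{\ell}\sqrt{h}) \right) = O\left( \frac{1}{\varepsilon\sqrt{k}} \log^{1.5}\frac{1}{\varepsilon} \log^{0.5}\frac{1}{\varepsilon\sqrt{k}} \right).$$

The communication for C includes all the summaries computed, and the elements sampled. For each $\ell$,
the total size of the summaries on level $\ell$ is

$$O\left( \frac{1}{\varepsilon\sqrt{k}\, 2^{\ell}} \cdot 2^{\ell}\sqrt{h} \right) = O\left( \frac{\sqrt{h}}{\varepsilon\sqrt{k}} \right).$$

Summing over all h levels, it is $O(h^{1.5}/(\varepsilon\sqrt{k}))$. There are at most 2k instances of C in a round, therefore the total
communication cost in a round is $O(h^{1.5}\sqrt{k}/\varepsilon)$. The number of sampled elements in a round is $O(\hat{n}p) =
O(\sqrt{k}/\varepsilon)$. Thus, over all $O(\log N)$ rounds, the total communication cost is $O(h^{1.5}\sqrt{k}/\varepsilon \cdot \log N)$.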

Estimation It remains to show how the coordinator estimates the rank of any given element x at any time
with variance $O((\varepsilon n)^2)$. We decompose all n elements that have arrived so far into smaller subsets, and
estimate the rank of x in each of the subsets. Since all estimators are unbiased, the overall estimator is also
unbiased; the variance will be the sum of all the variances.

We will focus on the current round; all previous rounds can be handled similarly. Recall that there are
$O(\hat{n})$ elements arriving in this round and $\hat{n} = \Theta(n)$. Every chunk of $\hat{n}/k$ elements is processed by one
instance of C. Consider any such chunk. Suppose up to now, $n'$ elements in this chunk have arrived for some
$n' \le \hat{n}/k$. We write $n'$ as $n' = q \cdot b + r$ for some $r < b$, and decompose these $n'$ elements into at most
$h + 1$ subsets. The first $qb$ elements are decomposed into at most h subsets, each of which corresponds to
a full node in the binary tree of C. The node has already sent its summary to the coordinator, which we can
use to estimate the rank. For a node at level $\ell$, the variance is $(2^{-\ell}/\sqrt{h} \cdot 2^{\ell} b)^2 = b^2/h$, so the total variance
from all h nodes is $b^2$.

For the last r elements of the chunk that are still being processed by an active node, the coordinator
does not have any summary for them. But recall that the site always samples each element with probability
$p = \sqrt{k}/(\varepsilon\hat{n})$ and sends it to the coordinator if it is sampled. Thus, the rank of x in these r elements can be
estimated by simply counting the number c of elements sampled that are smaller than x, and the estimator is
$c/p$. The variance of this estimator is $r/p < b/p = b^2$. Thus, the variance from any chunk is $O(b^2)$. Since
there are at most 2k chunks in the round, the total variance is $O(b^2 k) = O((\varepsilon\hat{n})^2) = O((\varepsilon n)^2)$. As the
variances of the previous rounds are geometrically decreasing, the total variance from all the rounds is still
bounded by $O((\varepsilon n)^2)$, as desired.
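
The decomposition of the first q blocks of a chunk into full tree nodes follows the binary representation of q. The sketch below illustrates this under assumptions made here: blocks are numbered from 0, the tree is a complete binary tree whose level-$\ell$ nodes cover $2^{\ell}$ consecutive blocks, and the function name `full_nodes` is hypothetical.

```python
def full_nodes(q):
    """Decompose the first q blocks into maximal full nodes of the binary tree.

    Returns a list of (level, first_block, last_block) with 0-based block indices;
    a node at level l covers 2**l consecutive blocks starting at a multiple of 2**l.
    """
    nodes = []
    start = 0
    level = q.bit_length() - 1
    while level >= 0:
        size = 1 << level
        if q & size:
            # This bit of q contributes one full node of 2**level blocks.
            nodes.append((level, start, start + size - 1))
            start += size
        level -= 1
    return nodes
```

Since q contributes at most one node per level, the coordinator combines at most one summary per level, each with variance $b^2/h$, plus the sampling-based estimate $c/p$ for the remaining r elements.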

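Theorem 4.1 There is an algorithm for the rank-tracking problem that, at any time, estimates the rank of
any element within error $\varepsilon n$ with probability at least 0.9. It uses $O\left( \frac{1}{\varepsilon\sqrt{k}} \log^{1.5}\frac{1}{\varepsilon} \log^{0.5}\frac{1}{\varepsilon\sqrt{k}} \right)$ space at each
site with communication cost $O\left( \frac{\sqrt{k}}{\varepsilon} \log N \log^{1.5}\frac{1}{\varepsilon\sqrt{k}} \right)$.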

A Lower bound for the sampling problem

Claim A.1 To solve the sampling problem we need to probe at least $\Omega(k)$ sites.

Proof. Suppose that the coordinator only samples z = o(k) sites. Let X be the number of sites that
are sampled with bit 1. Then X is chosen from the hypergeometric distribution with probability density
function (pdf) $\Pr[X = x] = \binom{s'}{x}\binom{k'-s'}{z-x} / \binom{k'}{z}$. The expected value of X is $\frac{z}{k'} \cdot s'$, which is $\frac{z}{k'}\left(\frac{k}{2} - y + \sqrt{k}\right)$
or $\frac{z}{k'}\left(\frac{k}{2} - y - \sqrt{k}\right)$, depending on the value of $s'$. Let $p = \left(\frac{k}{2} - y\right)/k' = \frac{1}{2} \pm o(1)$ and $\alpha = \sqrt{k}/k' =
1/\sqrt{k} \pm o(1/\sqrt{k})$. To avoid tedious calculation, we assume that X is picked randomly from one of the
two normal distributions $\mathcal{N}_1(\mu_1, \sigma_1^2)$ and $\mathcal{N}_2(\mu_2, \sigma_2^2)$ with equal probability, where $\mu_1 = z(p - \alpha)$, $\mu_2 =
z(p + \alpha)$, $\sigma_1, \sigma_2 = \Theta(\sqrt{zp(1-p)}) = \Theta(\sqrt{z})$. In Feller [15] it is shown that the normal distribution
approximates the hypergeometric distribution very well when z is large and $p \pm \alpha$ are constants in (0, 1).⁵

⁵ In Feller's book [15] the following is proved. Let $p \in (0, 1)$ be some constant and $q = 1 - p$. The population size is N and the
sample size is n, so that $n < N$ and $Np$, $Nq$ are both integers. The hypergeometric distribution is $P(k; n, N) = \binom{Np}{k}\binom{Nq}{n-k} / \binom{N}{n}$
for $0 \le k \le n$.

Now our task is to decide from which of the two distributions X is drawn based on the value of X with
success probability at least 0.7.

Let $f_1(x; \mu_1, \sigma_1^2)$ and $f_2(x; \mu_2, \sigma_2^2)$ be the pdfs of the two normal distributions $\mathcal{N}_1, \mathcal{N}_2$, respectively. It is
easy to see that the best deterministic algorithm for differentiating the two distributions based on the value of
a sample X will do the following.

• If $X > x_0$, then X is chosen from $\mathcal{N}_2$, otherwise X is chosen from $\mathcal{N}_1$, where $x_0$ is the value such
that $f_1(x_0; \mu_1, \sigma_1^2) = f_2(x_0; \mu_2, \sigma_2^2)$ (thus $\mu_1 < x_0 < \mu_2$).

Indeed, if $X > x_0$ and the algorithm decides that "X is chosen from $\mathcal{N}_1$", we can always flip this
decision and improve the success probability of the algorithm.

The error comes from two sources: (1) $X > x_0$ but X is actually drawn from $\mathcal{N}_1$; (2) $X \le x_0$ but X is
actually drawn from $\mathcal{N}_2$. The total error is

$$\frac{1}{2} \cdot \left( \Phi(-\ell_1/\sigma_1) + \Phi(-\ell_2/\sigma_2) \right),$$

where $\ell_1 = x_0 - \mu_1$ and $\ell_2 = \mu_2 - x_0$. (Thus $\ell_1 + \ell_2 = \mu_2 - \mu_1 = 2\alpha z$.) $\Phi(\cdot)$ is the cumulative distribution
function (cdf) of the normal distribution. See Figure 1.

Finally note that $\ell_1/\sigma_1 = O(\alpha z/\sqrt{z}) = O(\sqrt{z/k}) = o(1)$ and $\ell_2/\sigma_2 = O(\alpha z/\sqrt{z}) = o(1)$, so
$\Phi(-\ell_1/\sigma_1) + \Phi(-\ell_2/\sigma_2) > 0.99$. Therefore, the failure probability is at least 0.49, contradicting our
success probability guarantee. Thus we must have $z = \Omega(k)$. □
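
The error expression above can be evaluated numerically to see how it behaves as z grows relative to k. This is a rough sketch with simplifying assumptions that are not in the proof: it takes $k' = k$ and $p = 1/2$, gives both normals the same standard deviation $\sqrt{zp(1-p)}$ (so $x_0$ is the midpoint and $\ell_1 = \ell_2 = \alpha z$), and ignores the finite-population correction of the hypergeometric distribution. The function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def best_test_error(k, z):
    """Error of the threshold test under the simplifying assumptions stated above."""
    alpha = 1.0 / math.sqrt(k)
    sigma = math.sqrt(z * 0.25)   # sqrt(z * p * (1 - p)) with p = 1/2
    ell = alpha * z               # ell_1 = ell_2 = (mu_2 - mu_1) / 2
    return 0.5 * (phi(-ell / sigma) + phi(-ell / sigma))
```

Under these assumptions the error equals $\Phi(-2\sqrt{z/k})$: probing only a hundredth of the sites leaves an error above 0.4, which exceeds the allowed failure probability of 0.3, while the error only becomes small once z is a constant fraction of k.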



Theorem A.1 [15] If $N \to \infty$, $n \to \infty$ so that $n/N \to t \in (0, 1)$ and $x_k := (k - np)/\sqrt{npq} \to x$, then

$$p(k; n, N) \sim \frac{e^{-x^2/(2(1-t))}}{\sqrt{2\pi npq(1-t)}}.$$